WHO COVID-19 Research Database

BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration.

Lamsal, Rabindra; Read, Maria Rodriguez; Karunasekera, Shanika.

Data Brief ; 48: 109229, 2023 Jun.

Article in English | MEDLINE | ID: covidwho-2316364

ABSTRACT

The COVID-19 pandemic has introduced new norms such as social distancing, face masks, quarantine, lockdowns, travel restrictions, working and studying from home, and business closures. The pandemic's seriousness has made people vocal on social media, especially on microblogs such as Twitter. Since the early days of the outbreak, researchers have been collecting and sharing large-scale datasets of COVID-19 tweets. However, the existing datasets carry issues related to proportion and redundancy. We report that more than 500 million tweet identifiers point to deleted or protected tweets. To address these issues, this paper introduces BillionCOV, an enriched, global, billion-scale dataset of English-language COVID-19 tweets that contains 1.4 billion tweets originating from 240 countries and territories and spanning October 2019 to April 2022. Importantly, BillionCOV enables researchers to filter tweet identifiers before hydration, making hydration more efficient. We anticipate that a dataset of this scale, with global scope and extended temporal coverage, will aid in obtaining a thorough understanding of the pandemic's conversational dynamics.
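"Hydration" means fetching the full tweet objects for a list of tweet identifiers. The efficiency gain from filtering identifiers first can be sketched as follows: select only the IDs whose metadata matches the study's criteria, then split them into lookup-sized batches (the Twitter API v2 tweet lookup endpoint accepts up to 100 IDs per request). This is a minimal illustration, not the authors' tooling: the `TweetRecord` fields (`country_code`, `created`) and the helper names `filter_ids` and `batches` are hypothetical and do not reflect BillionCOV's actual column names, and no API call is made.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, List, Set


@dataclass(frozen=True)
class TweetRecord:
    # Hypothetical schema for illustration only; not BillionCOV's real columns.
    tweet_id: str
    country_code: str
    created: date


def filter_ids(records: Iterable[TweetRecord], countries: Set[str],
               start: date, end: date) -> List[str]:
    """Keep only IDs whose metadata matches the criteria, so fewer IDs need hydrating."""
    return [r.tweet_id for r in records
            if r.country_code in countries and start <= r.created <= end]


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Split IDs into chunks of at most `size` (100 matches the v2 lookup limit)."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for i in range(0, len(ids), size):
        yield ids[i:i + size]
```

For example, filtering three records to country `"AU"` within calendar year 2020 keeps only the matching ID, and 250 surviving IDs would be sent as three requests of 100, 100, and 50 IDs instead of hydrating the whole collection.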

Twitter conversations predict the daily confirmed COVID-19 cases.

Lamsal, Rabindra; Harwood, Aaron; Read, Maria Rodriguez.

Appl Soft Comput ; 129: 109603, 2022 Nov.

Article in English | MEDLINE | ID: covidwho-2007455

ABSTRACT

At the time of writing, COVID-19 (Coronavirus disease 2019) had spread to more than 220 countries and territories. Following the outbreak, the pandemic's seriousness has made people more active on social media, especially on microblogging platforms such as Twitter and Weibo, where pandemic-specific discourse has remained trending for months. Previous studies have confirmed that such socially generated conversations contribute to situational awareness during crisis events. Early forecasts of cases are essential for authorities to estimate the resources needed to cope with the consequences of the virus. This study therefore incorporates public discourse into the design of forecasting models, targeting in particular the steep-hill region of an ongoing wave. We propose a sentiment-involved, topic-based latent variable search methodology for designing forecasting models from publicly available Twitter conversations. As a use case, we apply the proposed methodology to Australian daily COVID-19 cases and Twitter conversations generated within the country. The experimental results (i) show the presence of latent social media variables that Granger-cause the daily confirmed COVID-19 cases and (ii) confirm that those variables add prediction capability to the forecasting models. Further, including the social media variables improves RMSE by 48.83%–51.38% over the baseline models. We also publicly release MegaGeoCOV, a large-scale global dataset of geotagged COVID-19 tweets, anticipating that geotagged data of this scale will aid in understanding the pandemic's conversational dynamics in other spatial and temporal contexts.
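The reported gains are relative reductions in root-mean-square error (RMSE). Assuming the common definition, improvement = 100 × (RMSE_baseline − RMSE_model) / RMSE_baseline (the abstract does not state the formula explicitly), the arithmetic can be sketched as below. The function names and the toy numbers are illustrative only and are not data from the study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_improvement(actual: Sequence[float], baseline_pred: Sequence[float],
                     model_pred: Sequence[float]) -> float:
    """Percentage reduction in RMSE of the model relative to the baseline (assumed definition)."""
    base = rmse(actual, baseline_pred)
    return 100.0 * (base - rmse(actual, model_pred)) / base
```

With toy values `actual = [10, 20, 30]`, a baseline off by 4 at every step (RMSE 4.0) and a model off by 2 at every step (RMSE 2.0), the improvement is 50%, in the same range as the reported figures.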
